Project Intro

  • EDA

  • Import & Analyse the data.

  • Check for Incomplete Information

  • Target Class Distribution

  • The target class distribution is heavily imbalanced: most calls are assigned to Group 0, and even with Group 0 excluded, the remaining groups are still imbalanced.
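The imbalance described above can be quantified with a few lines of code. This is a minimal sketch, assuming the group labels are available as a plain list of strings; the label names (`GRP_0` and so on) and the toy counts are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return (label, count, share) tuples sorted from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


def imbalance_ratio(labels, exclude=()):
    """Ratio of the largest to the smallest class size, optionally ignoring some labels."""
    counts = Counter(label for label in labels if label not in exclude)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels, for illustration only.
labels = ["GRP_0"] * 8 + ["GRP_1"] * 3 + ["GRP_2"] * 1

for label, n, share in class_distribution(labels):
    print(f"{label}: {n} tickets ({share:.1%})")

print(imbalance_ratio(labels))                     # all groups
print(imbalance_ratio(labels, exclude={"GRP_0"}))  # without the dominant group
```

Computing the ratio with and without the dominant group mirrors the two observations in the note: the skew caused by Group 0, and the skew that remains among the other groups.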

  • Choosing a Metric to benchmark model performance

Because tickets must be classified into every functional group and all groups are given equal importance, we choose AUC as the final metric for scoring model performance.
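One way to make AUC treat every group equally is to compute a one-vs-rest AUC per class and take the unweighted (macro) mean. The notebook does not state which averaging it uses, so the macro one-vs-rest choice below is an assumption, and the class names and probabilities are made up for illustration.

```python
def binary_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Unweighted mean of one-vs-rest AUCs, so every class counts equally."""
    aucs = []
    for k, cls in enumerate(classes):
        is_cls = [1 if y == cls else 0 for y in y_true]
        aucs.append(binary_auc(is_cls, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


# Hypothetical predictions: one row of class probabilities per ticket.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "C"]
proba = [
    [0.70, 0.20, 0.10],
    [0.25, 0.65, 0.10],
    [0.30, 0.60, 0.10],
    [0.20, 0.20, 0.60],
]
score = macro_ovr_auc(y_true, proba, classes)
print(round(score, 4))
```

With scikit-learn available, `roc_auc_score(y_true, proba, multi_class="ovr", average="macro")` computes the same quantity; the pure-Python version is shown only to make the averaging explicit.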

  • Outlier Analysis

  • Most descriptions are between 6 and 28 words long, with a median of 41 (106 characters) and a mean of 27.2; relatively few outliers run up to 1625 words.
  • Most short descriptions are between 4 and 9 words long, with a median of 6 (41 characters) and a mean of 6.92; relatively few outliers run up to 28 words.
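Length statistics like the ones above can be produced with the standard library. The sketch below assumes whitespace tokenisation and flags outliers with the common 1.5 × IQR rule; the notebook does not say which outlier rule it uses, so that choice and the sample texts are assumptions.

```python
import statistics


def word_count_summary(texts):
    """Summarise word counts and flag unusually long texts with the 1.5 * IQR rule."""
    counts = sorted(len(text.split()) for text in texts)
    q1, median, q3 = statistics.quantiles(counts, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(counts),
        "max": counts[-1],
        "outliers": [c for c in counts if c > upper_fence],
    }


# Hypothetical descriptions: five short texts and one very long one.
texts = ["a b c", "a b c d", "a b c d e", "a b c d", "a b c", "word " * 50]
summary = word_count_summary(texts)
print(summary)
```

Reporting the quartiles next to the mean and maximum makes it easy to see when a handful of very long tickets pulls the mean away from the typical length.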

  • Fix Text Encoding

  • Word Frequency Distributions & WordClouds

  • Stopwords and anchor words such as 'From:' and 'Recieved' have to be stripped out.
  • Many of the most frequent words in the dataset are stopwords. We may need to add stopword removal to our pre-processing if it improves model performance.
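A minimal sketch of the two clean-up steps mentioned above, with stopword removal kept behind a flag so it can be switched on only if it helps the model. The anchor pattern and the stopword list are hypothetical and deliberately tiny; real lists would be derived from the frequency analysis.

```python
import re

# Hypothetical anchor patterns and stopwords, for illustration only.
ANCHOR_PATTERN = re.compile(r"\b(?:from|to|subject|received from)\s*:", re.IGNORECASE)
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "i", "my"}


def strip_anchors(text):
    """Replace e-mail style anchors such as 'From:' with a space."""
    return ANCHOR_PATTERN.sub(" ", text)


def remove_stopwords(text):
    """Lower-case the text, tokenise it, and drop tokens found in STOPWORDS."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(token for token in tokens if token not in STOPWORDS)


def clean(text, drop_stopwords=True):
    """Strip anchors, then optionally remove stopwords."""
    text = strip_anchors(text)
    if drop_stopwords:
        return remove_stopwords(text)
    return " ".join(text.split())  # only normalise whitespace


raw = "From: helpdesk Subject: I cannot login to the VPN"
print(clean(raw))
print(clean(raw, drop_stopwords=False))
```

Keeping `drop_stopwords` as a parameter lets both variants be compared on the validation metric before committing to one.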

  • Descriptions WordCloud

  • Short Descriptions WordCloud

  • Group 0 Descriptions WordCloud

  • Other Groups Descriptions WordCloud

  • Short Descriptions

  • Descriptions

  • Description Lengths vs. Functional Group

  • Language Detection

  • Pre-Processing

Pipeline

  • Outage Questionnaires

  • Security/Event Logs

  • Clean up caller ids in description

  • Clean Irrelevant Information

  • Clean Anchors

  • Parse Emails

  • Gibberish Removal

  • Utility functions for Generic Pre-Processing

  • Language Translation

Detect foreign languages in the dataset and perform machine translation with Hugging Face models. Machine translation via cloud services has come a long way and produces high-quality results; this notebook shows that models from Hugging Face give developers a reasonable alternative for local machine translation.
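As a rough sketch of what local translation can look like, the snippet below uses the `transformers` translation pipeline with a Helsinki-NLP OPUS-MT checkpoint. The notebook does not name the models it uses, so the model family, English as the target language, and the helper names are all assumptions; the language code is expected to come from the detection step.

```python
def opus_mt_model_name(src_lang, tgt_lang="en"):
    """Build the Hub name of an OPUS-MT checkpoint (assumed model family)."""
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"


def translate_to_english(texts, src_lang):
    """Translate a list of texts with a locally run Hugging Face model.

    Requires the `transformers` and `sentencepiece` packages; the model is
    downloaded on first use and then runs on the local machine.
    """
    # Imported lazily so the helper above works without the dependency installed.
    from transformers import pipeline

    translator = pipeline("translation", model=opus_mt_model_name(src_lang))
    return [result["translation_text"] for result in translator(list(texts))]


print(opus_mt_model_name("de"))  # Helsinki-NLP/opus-mt-de-en
```

A call such as `translate_to_english(["Ich kann mich nicht anmelden"], "de")` would then return the English text. Not every language pair has a dedicated checkpoint, so the model name should be checked on the Hub before relying on it.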